Effective Incorporation of Source Syntax into Hierarchical Phrase-based Translation
Abstract
In this paper we explicitly consider source language syntactic information in both rule extraction and decoding for hierarchical phrase-based translation. We obtain tree-to-string rules by the GHKM method and use them to complement Hiero-style rules. All these rules are then employed to decode new sentences with source language parse trees. We experiment with our approach in a state-of-the-art Chinese-English system and demonstrate +1.2 and +0.8 BLEU improvements on the NIST newswire and web evaluation data of MT08 and MT12.
Similar resources
A unified framework for phrase-based, hierarchical, and syntax-based statistical machine translation
Despite many differences between phrase-based, hierarchical, and syntax-based translation models, their training and testing pipelines are strikingly similar. Drawing on this fact, we extend the Moses toolkit to implement hierarchical and syntactic models, making it the first open source toolkit with end-to-end support for all three of these popular models in a single package. This extension su...
Phrasal: A Toolkit for Statistical Machine Translation with Facilities for Extraction and Incorporation of Arbitrary Model Features
We present a new Java-based open source toolkit for phrase-based machine translation. The key innovation provided by the toolkit is to use APIs for integrating new features (/knowledge sources) into the decoding model and for extracting feature statistics from aligned bitexts. The package was used to develop a number of useful features written to these APIs including features for hierarchical r...
Improving statistical machine translation with linguistic information
Statistical machine translation (SMT) should benefit from linguistic information to improve performance but current state-of-the-art models rely purely on data-driven models. There are several reasons why prior efforts to build linguistically annotated models have failed or not even been attempted. Firstly, the practical implementation often requires too much work to be cost effective. Where ad...
New Parameterizations and Features for PSCFG-Based Machine Translation
We propose several improvements to the hierarchical phrase-based MT model of Chiang (2005) and its syntax-based extension by Zollmann and Venugopal (2006). We add a source-span variance model that, for each rule utilized in a probabilistic synchronous context-free grammar (PSCFG) derivation, gives a confidence estimate in the rule based on the number of source words spanned by the rule and its ...
Post-ordering in Statistical Machine Translation
In the field of statistical machine translation (SMT), pre-ordering is an approach that has recently attracted attention: it reorders source language words into the target language order prior to SMT decoding. It is effective for long-distance reordering in SMT, especially between languages with distant word ordering like English and Japanese. Its key idea is to decompose the SMT problem into two subproblems of t...